In this paper we study a Tikhonov-type method for ill-posed nonlinear operator equations $g^\dagger = F(u^\dagger)$ where $g^\dagger$ is an integrable, non-negative function. We assume that data are drawn from a Poisson process with density $t g^\dagger$ where $t > 0$ may be interpreted as an exposure time. Such problems occur in many photonic imaging applications including positron emission tomography, confocal fluorescence microscopy, astronomic observations, and phase retrieval problems in optics. Our approach uses a Kullback-Leibler-type data fidelity functional and allows for general convex penalty terms. We prove convergence rates of the expectation of the reconstruction error under a variational source condition as $t \to \infty$ both for an a priori and for a Lepskii-type parameter choice rule.



1. Introduction 



We consider inverse problems where the ideal data can be interpreted as a photon density $g^\dagger \in L^1(\mathbb{M})$ on some manifold $\mathbb{M}$. The unknown $u^\dagger$ will be described by an element of a subset $\mathfrak{B}$ of a Banach space $\mathcal{X}$, and $u^\dagger$ and $g^\dagger$ are related by a forward operator $F$ mapping from $\mathfrak{B}$ to $L^1(\mathbb{M})$:

$$F(u^\dagger) = g^\dagger. \tag{1}$$

The data will be drawn from a Poisson process with density $t g^\dagger$ where $t > 0$ can often be interpreted as an exposure time. Such data can be seen as a random collection of points on the manifold $\mathbb{M}$ on which measurements are taken (see Section 2 for a precise definition of Poisson processes). Hence, unlike in the common deterministic setup, the data do not belong to the same space as the ideal data $g^\dagger$.

Such inverse problems occur naturally in photonic imaging since photon count data are Poisson distributed for fundamental physical reasons. Examples include inverse problems in astronomy [?], fluorescence microscopy, in particular 4Pi microscopy [37], coherent X-ray imaging [17], and positron emission tomography [5].
In this paper we study a penalized likelihood or Tikhonov-type estimator

$$u_\alpha \in \operatorname*{argmin}_{u \in \mathfrak{B}} \left[\mathcal{S}(G_t; F(u)) + \alpha\mathcal{R}(u)\right]. \tag{2}$$

Here $G_t$ describes the observed data, $\mathcal{S}$ is a Kullback-Leibler-type data misfit functional derived in Section 2, $\alpha > 0$ is a regularization parameter, and $\mathcal{R}: \mathcal{X} \to (-\infty, \infty]$ is a convex penalty term, which may incorporate a priori knowledge about the unknown solution $u^\dagger$. If $\mathcal{S}(g_1; g_2) = \|g_1 - g_2\|_{\mathcal{Y}}^2$ and $\mathcal{R}(u) = \|u - u_0\|_{\mathcal{X}}^2$ with Hilbert space norms $\|\cdot\|_{\mathcal{X}}$ and $\|\cdot\|_{\mathcal{Y}}$, then (2) is standard Tikhonov regularization. In many cases estimators of the form (2) can be interpreted as maximum a posteriori (MAP) estimators in a Bayesian framework, but our convergence analysis will follow a frequentist paradigm, and in particular $u^\dagger$ will be considered as a deterministic quantity. Our data misfit functional $\mathcal{S}$ will be convex in its second argument, so the minimization problem (2) will be convex if $F$ is linear.

Recently, considerable progress has been achieved in the deterministic analysis of variational regularization methods in Banach spaces [?]. In particular, a number of papers have been devoted to the Kullback-Leibler divergence as data fidelity term in (2), motivated by the case of Poisson data (see [2, 5, 10, 11, 23]), but all of them under deterministic error assumptions. On the statistical side, inverse problems for Poisson data have been studied by Antoniadis & Bigot [1] by wavelet Galerkin methods. Their study is restricted to linear operators with favorable mapping properties in certain function spaces. Therefore, there is a need for a statistical convergence analysis for inverse problems with Poisson data involving more general and in particular nonlinear forward operators. This is the aim of the present paper.

Our convergence analysis of the estimator (2) is based on two basic ingredients. The first is a uniform concentration inequality for Poisson data (Theorem 2.1), which will be formulated together with some basic properties of Poisson processes in Section 2. The proof of Theorem 2.1, which is based on results by Reynaud-Bouret [25], is given in an appendix. The second ingredient, presented in Section 3, is a deterministic error analysis of (2) for general $\mathcal{S}$ under a variational source condition (Theorem 3.3). Our main results are two estimates of the expected reconstruction error as the exposure time $t$ tends to $\infty$. For an a priori choice of $\alpha$, which requires knowledge of the smoothness of $u^\dagger$, such a result is shown in Theorem 4.3. Finally, a convergence rate result for a completely adaptive method, where $\alpha$ is chosen by a Lepskii-type balancing principle, is presented in Theorem 5.1.



2. Results on Poisson processes 



Let $\mathbb{M} \subset \mathbb{R}^d$ be a submanifold where measurements are taken, and let $\{x_1, \dots, x_N\} \subset \mathbb{M}$ denote the positions of the detected photons. Both the total number $N$ of observed photons and the positions $x_i \in \mathbb{M}$ of the photons are random, and it is physically evident that the following two properties hold true:



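1. For all measurable subsets $\mathbb{M}' \subset \mathbb{M}$ the integer-valued random variable $G(\mathbb{M}') := \#\{i \mid x_i \in \mathbb{M}'\}$ has expectation $\mathbb{E}[G(\mathbb{M}')] = \int_{\mathbb{M}'} g^\dagger\, dx$, where $g^\dagger \in L^1(\mathbb{M})$ denotes the underlying photon density.

2. For any choice of $m$ disjoint measurable subsets $\mathbb{M}'_1, \dots, \mathbb{M}'_m \subset \mathbb{M}$ the random variables $G(\mathbb{M}'_1), \dots, G(\mathbb{M}'_m)$ are stochastically independent.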



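By definition, this means that $G := \sum_{i=1}^N \delta_{x_i}$ is a Poisson process with intensity $g^\dagger$. It follows from these properties that $G(\mathbb{M}')$ for any measurable $\mathbb{M}' \subset \mathbb{M}$ is Poisson distributed with mean $\lambda := \mathbb{E}[G(\mathbb{M}')]$, i.e. $\mathbb{P}[G(\mathbb{M}') = n] = \exp(-\lambda)\lambda^n/n!$ for all $n \in \{0, 1, \dots\}$ (see e.g. [?, Thm. 1.11.8]). Moreover, for any measurable $\psi: \mathbb{M} \to \mathbb{R}$ we have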



$$\mathbb{E}\left[\int_{\mathbb{M}} \psi\, dG\right] = \int_{\mathbb{M}} \psi g^\dagger\, dx, \qquad \operatorname{Var}\left[\int_{\mathbb{M}} \psi\, dG\right] = \int_{\mathbb{M}} \psi^2 g^\dagger\, dx \tag{3}$$

whenever the right-hand sides are well defined (see [?]).
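The moment formulas (3) can be checked by simulation. The following Python sketch samples a Poisson process on the hypothetical domain $\mathbb{M} = [0, 1]$ by thinning (the Lewis-Shedler method, a standard simulation technique that is not part of this paper) and compares the empirical mean and variance of $\int \psi\, dG$ with the right-hand sides of (3). The density `g_dagger`, the test function `psi` and the sample sizes are illustrative assumptions.

```python
import math
import random


def sample_poisson_process(intensity, intensity_max, rng):
    """Points of a Poisson process on [0, 1] with the given intensity.

    Thinning: simulate a homogeneous process with rate intensity_max via
    exponential inter-arrival times and keep a point x with probability
    intensity(x) / intensity_max.  Requires 0 <= intensity <= intensity_max.
    """
    points = []
    x = rng.expovariate(intensity_max)
    while x <= 1.0:
        if rng.random() * intensity_max < intensity(x):
            points.append(x)
        x += rng.expovariate(intensity_max)
    return points


def midpoint(f, n=2000):
    """Midpoint rule for the integral of f over [0, 1]."""
    h = 1.0 / n
    return h * sum(f((k + 0.5) * h) for k in range(n))


# Hypothetical photon density (bounded by 100) and test function psi.
g_dagger = lambda x: 50.0 * (1.0 + math.sin(2.0 * math.pi * x))
psi = lambda x: x * x

rng = random.Random(0)
samples = [sum(psi(x) for x in sample_poisson_process(g_dagger, 100.0, rng))
           for _ in range(4000)]
mean = sum(samples) / len(samples)
var = sum((v - mean) ** 2 for v in samples) / (len(samples) - 1)

print("E[int psi dG]  :", mean, "vs", midpoint(lambda x: psi(x) * g_dagger(x)))
print("Var[int psi dG]:", var, "vs", midpoint(lambda x: psi(x) ** 2 * g_dagger(x)))
```

Thinning only needs an upper bound on the intensity; with 4000 repetitions the Monte Carlo error of the empirical mean is roughly 0.03 here.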

Let us introduce for each exposure time $t > 0$ a Poisson process $\tilde G_t$ with intensity $t g^\dagger$ and define $G_t := \frac{1}{t}\tilde G_t$. We will study error estimates for approximate solutions to the inverse problem (1) with data $G_t$ in the limit $t \to \infty$. To this end it will be necessary to derive estimates on the distribution of the log-likelihood functional



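$$\mathcal{S}(G_t; g) := \int_{\mathbb{M}} g\, dx - \int_{\mathbb{M}} \ln g\, dG_t = \int_{\mathbb{M}} g\, dx - \frac{1}{t}\sum_{i=1}^N \ln g(x_i), \tag{4}$$

which is defined for functions $g$ fulfilling $g \geq 0$ a.e. We set $\ln 0 := -\infty$, so $\mathcal{S}(G_t; g) = \infty$ if $g(x_i) = 0$ for some $i = 1, \dots, N$. Using (3) we obtain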

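$$\mathbb{E}[\mathcal{S}(G_t; g)] = \int_{\mathbb{M}} \left[g - g^\dagger\ln(g)\right] dx \quad \text{and} \quad \operatorname{Var}[\mathcal{S}(G_t; g)] = \frac{1}{t}\int_{\mathbb{M}} \ln(g)^2 g^\dagger\, dx \tag{5}$$

if the integrals exist. Moreover, we have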



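$$\mathbb{E}[\mathcal{S}(G_t; g)] - \mathbb{E}[\mathcal{S}(G_t; g^\dagger)] = \int_{\mathbb{M}} \left[g - g^\dagger - g^\dagger\ln\left(\frac{g}{g^\dagger}\right)\right] dx,$$

and the right-hand side (if well defined) is known as the Kullback-Leibler divergence

$$\mathrm{KL}(g^\dagger; g) := \int_{\mathbb{M}} \left(g - g^\dagger\right) dx - \int_{\{g^\dagger > 0\}} g^\dagger\ln\left(\frac{g}{g^\dagger}\right) dx. \tag{6}$$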



$\mathrm{KL}(g^\dagger; g)$ can be seen as the ideal data misfit functional if the exact data $g^\dagger$ were known. Since only $G_t$ is given, we approximate $\mathrm{KL}(g^\dagger; g)$ by $\mathcal{S}(G_t; g)$ up to the additive constant $\mathbb{E}[\mathcal{S}(G_t; g^\dagger)]$, which is independent of $g$. The error between the estimated and the ideal data misfit functional is given by



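$$\left|\mathcal{S}(G_t; g) - \mathbb{E}[\mathcal{S}(G_t; g^\dagger)] - \mathrm{KL}(g^\dagger; g)\right| = \left|\int_{\mathbb{M}} \ln(g)\left(dG_t - g^\dagger\, dx\right)\right|. \tag{7}$$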
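A minimal numerical sketch of the Kullback-Leibler divergence (6) on the hypothetical domain $\mathbb{M} = [0, 1]$, using a midpoint rule. The function name `kl_divergence` and the quadrature are illustrative choices; the convention $\ln 0 = -\infty$ from the text shows up as an infinite value when $g$ vanishes where $g^\dagger > 0$.

```python
import math


def kl_divergence(g_true, g, n=4000):
    """KL(g_true; g) from (6) on the hypothetical domain M = [0, 1].

    Uses the midpoint rule.  Where g_true(x) == 0 only the term g(x) contributes,
    which restricts the logarithmic part to the set {g_true > 0}.  Returns
    math.inf if g vanishes somewhere where g_true is positive.
    """
    h = 1.0 / n
    total = 0.0
    for k in range(n):
        x = (k + 0.5) * h
        a, b = g_true(x), g(x)
        term = b - a
        if a > 0.0:
            if b <= 0.0:
                return math.inf
            term -= a * math.log(b / a)
        total += term * h
    return total


print(kl_divergence(lambda x: 1.0, lambda x: 2.0))  # 1 - ln 2, about 0.3069
print(kl_divergence(lambda x: 1.0, lambda x: 0.0))  # inf: g has zeros where g_true > 0
```

The second call illustrates why zeros of $F(u)$ are problematic for the unshifted functional.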



Based on results by Reynaud-Bouret [25], which can be seen as an analogue of Talagrand's concentration inequalities for empirical processes, we will derive the following concentration inequality for such error terms in the appendix:

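Theorem 2.1. Let $\mathbb{M} \subset \mathbb{R}^d$ be a bounded domain with Lipschitz boundary $\partial\mathbb{M}$, $R \geq 1$ and $s > d/2$. Consider the ball

$$B_s(R) := \left\{g \in H^s(\mathbb{M}) \;\middle|\; \|g\|_{H^s(\mathbb{M})} \leq R\right\}.$$

Then there exists a constant $C_{\mathrm{conc}} \geq 1$ depending only on $\mathbb{M}$, $s$ and $\|g^\dagger\|_{L^1(\mathbb{M})}$ such that

$$\mathbb{P}\left[\sup_{g \in B_s(R)}\left|\int_{\mathbb{M}} g\left(dG_t - g^\dagger\, dx\right)\right| \leq \frac{\rho}{\sqrt{t}}\right] \geq 1 - \exp\left(-\frac{\rho}{RC_{\mathrm{conc}}}\right)$$

for all $t \geq 1$ and $\rho \geq RC_{\mathrm{conc}}$.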



To apply this concentration inequality to the right-hand side of (7) we would need that $\ln(F(u)) \in B_s(R)$ for all $u \in \mathfrak{B}$. However, since $\sup_{g \in B_s(R)}\|g\|_\infty < \infty$ by Sobolev's embedding theorem, zeros of $F(u)$ for some $u \in \mathfrak{B}$ would not be admissible, which is a quite restrictive assumption. Therefore, we use a shifted version of the data fidelity term with an offset parameter $\sigma > 0$:

$$\mathcal{S}(G_t; g) := \int_{\mathbb{M}} g\, dx - \int_{\mathbb{M}} \ln(g + \sigma)\left(dG_t + \sigma\, dx\right), \tag{8}$$
$$\mathcal{T}(g^\dagger; g) := \mathrm{KL}\left(g^\dagger + \sigma; g + \sigma\right). \tag{9}$$
Then the error is given by

$$Z(g) := \left|\mathcal{S}(G_t; g) - \mathbb{E}[\mathcal{S}(G_t; g^\dagger)] - \mathcal{T}(g^\dagger; g)\right| = \left|\int_{\mathbb{M}} \ln(g + \sigma)\left(dG_t - g^\dagger\, dx\right)\right|. \tag{10}$$



We will show in Corollary 4.2 that Theorem 2.1 can be used to estimate the concentration of $\sup_{u \in \mathfrak{B}} Z(F(u))$ under certain assumptions.
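To illustrate (8) and (10), the following sketch evaluates the shifted fidelity $\mathcal{S}(G_t; g)$ from the photon positions and estimates $\mathbb{E}[Z(g)]$ by Monte Carlo for two exposure times. By (3) the variance of $\int \ln(g + \sigma)\, dG_t$ is $\frac{1}{t}\int \ln(g + \sigma)^2 g^\dagger\, dx$, so $Z(g)$ should shrink roughly like $1/\sqrt{t}$. All functions, the offset $\sigma = 0.1$ and the choice $g(x) = x$ (which vanishes at $x = 0$) are hypothetical, and the domain is assumed to be $[0, 1]$.

```python
import math
import random


def midpoint(f, n=2000):
    """Midpoint rule for the integral of f over [0, 1]."""
    h = 1.0 / n
    return h * sum(f((k + 0.5) * h) for k in range(n))


def sample_G_t(g_dagger, g_max, t, rng):
    """Points x_i of a Poisson process with intensity t * g_dagger on [0, 1] (thinning)."""
    rate = t * g_max
    pts, x = [], rng.expovariate(rate)
    while x <= 1.0:
        if rng.random() * g_max < g_dagger(x):
            pts.append(x)
        x += rng.expovariate(rate)
    return pts


def shifted_fidelity(points, t, g, sigma):
    """S(G_t; g) from (8) for G_t = (1/t) * sum_i delta_{x_i}."""
    log_shift = lambda x: math.log(g(x) + sigma)
    return midpoint(g) - sum(log_shift(x) for x in points) / t - sigma * midpoint(log_shift)


def error_Z(points, t, g, g_dagger, sigma):
    """Z(g) = |int ln(g + sigma) (dG_t - g_dagger dx)| from (10)."""
    log_shift = lambda x: math.log(g(x) + sigma)
    empirical = sum(log_shift(x) for x in points) / t
    return abs(empirical - midpoint(lambda x: log_shift(x) * g_dagger(x)))


# Hypothetical example: g vanishes at x = 0, which the offset sigma makes harmless.
g_dagger = lambda x: 1.0 + 0.5 * math.cos(2.0 * math.pi * x)
g = lambda x: x
sigma = 0.1
rng = random.Random(1)


def mean_Z(t, reps=200):
    """Monte Carlo estimate of E[Z(g)] for exposure time t."""
    return sum(error_Z(sample_G_t(g_dagger, 1.5, t, rng), t, g, g_dagger, sigma)
               for _ in range(reps)) / reps


z_small, z_large = mean_Z(10.0), mean_Z(1000.0)
print(z_small, z_large, z_small / z_large)  # ratio should be near sqrt(100) = 10
```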

3. A deterministic convergence rate result 

In this section we will perform a convergence analysis for the method (2) with general $\mathcal{S}$ under a deterministic noise assumption. Similar results have been obtained by Flemming [10, 11], Grasmair [13], and Boț & Hofmann [?] under different assumptions on $\mathcal{S}$.

As in Section 2 we will consider the "distance" between the estimated and the ideal data misfit functional as noise level:

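Assumption 1. Let $u^\dagger \in \mathfrak{B} \subset \mathcal{X}$ be the exact solution and denote by $g^\dagger := F(u^\dagger) \in \mathcal{Y}$ the exact data. Let $\mathcal{Y}^{\mathrm{obs}}$ be a set containing all possible observations and $g^{\mathrm{obs}} \in \mathcal{Y}^{\mathrm{obs}}$ the observed data. Assume that:

1. The exact data fidelity functional $\mathcal{T}: F(\mathfrak{B}) \times \mathcal{Y} \to [0, \infty]$ is non-negative, and $\mathcal{T}(g^\dagger; g^\dagger) = 0$.

2. For the approximate data fidelity term $\mathcal{S}: \mathcal{Y}^{\mathrm{obs}} \times \mathcal{Y} \to (-\infty, \infty]$ there exist constants $\mathbf{err} \geq 0$ and $C_{\mathrm{err}} \geq 1$ such that

$$\mathcal{S}(g^{\mathrm{obs}}; g) - \mathcal{S}(g^{\mathrm{obs}}; g^\dagger) \geq \frac{1}{C_{\mathrm{err}}}\mathcal{T}(g^\dagger; g) - \mathbf{err} \tag{11}$$

for all $g \in F(\mathfrak{B})$.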



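Example 3.1.

• Classical deterministic noise model: If $\mathcal{S}(\hat g; g) = \mathcal{T}(\hat g; g) = \|g - \hat g\|_{\mathcal{Y}}^q$ with $q \geq 1$, then we obtain from $\|a - b\|^q \geq 2^{1-q}\|a\|^q - \|b\|^q$ that (11) holds true with $C_{\mathrm{err}} = 2^{q-1}$ and $\mathbf{err} = 2\|g^\dagger - g^{\mathrm{obs}}\|_{\mathcal{Y}}^q$. Thus Assumption 1 covers the classical deterministic noise model.

• Poisson data: For the case of $\mathcal{S}$ and $\mathcal{T}$ as in (8) and (9) it can be seen from elementary calculations that (11) holds true with $C_{\mathrm{err}} = 1$ if

$$\mathbf{err} \geq -\int_{\mathbb{M}} \ln(g^\dagger + \sigma)\left(dG_t - g^\dagger\, dx\right) + \int_{\mathbb{M}} \ln(F(u) + \sigma)\left(dG_t - g^\dagger\, dx\right) \tag{12}$$

for all $u \in \mathfrak{B}$. Consequently, (11) holds true with $C_{\mathrm{err}} = 1$ if $\mathbf{err}/2$ is an upper bound for the integrals in (10) with $g = F(u)$, $u \in \mathfrak{B}$. We will show that Theorem 2.1 ensures that this holds true for $\mathbf{err}/2 = \rho/\sqrt{t}$ with probability $\geq 1 - \exp(-c\rho)$ for some constant $c > 0$ (cf. Corollary 4.2).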
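A quick numerical check of the classical case in Example 3.1, assuming $\mathcal{Y} = \mathbb{R}^5$ with the Euclidean norm as a stand-in for a Hilbert space: random triples $(g^\dagger, g^{\mathrm{obs}}, g)$ are drawn and (11) is tested with $C_{\mathrm{err}} = 2^{q-1}$ and $\mathbf{err} = 2\|g^\dagger - g^{\mathrm{obs}}\|^q$. This is a sanity check of the constants, not a proof; the helper names are hypothetical.

```python
import math
import random


def norm(v):
    """Euclidean norm, standing in for the Hilbert norm of Y."""
    return math.sqrt(sum(x * x for x in v))


def diff(a, b):
    return [x - y for x, y in zip(a, b)]


def assumption1_classical_holds(q, trials=2000, dim=5, seed=0):
    """Check (11) for S = T = ||. - .||^q with C_err = 2**(q-1), err = 2 ||g_dag - g_obs||^q."""
    rng = random.Random(seed)
    c_err = 2.0 ** (q - 1.0)
    for _ in range(trials):
        g_dag = [rng.gauss(0.0, 1.0) for _ in range(dim)]
        g_obs = [x + 0.3 * rng.gauss(0.0, 1.0) for x in g_dag]  # noisy data
        g = [rng.gauss(0.0, 2.0) for _ in range(dim)]           # arbitrary candidate
        err = 2.0 * norm(diff(g_dag, g_obs)) ** q
        lhs = norm(diff(g_obs, g)) ** q - norm(diff(g_obs, g_dag)) ** q
        rhs = norm(diff(g_dag, g)) ** q / c_err - err
        if lhs < rhs - 1e-9 * (1.0 + abs(rhs)):  # small relative tolerance for rounding
            return False
    return True


print([assumption1_classical_holds(q) for q in (1.0, 1.5, 2.0, 3.0)])
```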

In a previous study of Newton-type methods for inverse problems with Poisson data [17] the authors had to use a slightly stronger assumption on the noise level involving a second inequality. [17, Assumption 2] implies (11) with $\mathbf{err} = (1 + C_{\mathrm{err}})\sup_{g \in F(\mathfrak{B})}\mathbf{err}(g)$ provided this value is finite. On the other hand, (11) allows that $\mathcal{S}(g^{\mathrm{obs}}; g) = \infty$ even if $\mathcal{T}(g^\dagger; g) < \infty$, which is impossible in [17, Assumption 2] if $\mathbf{err}(g) < \infty$.

To measure the smoothness of the unknown solution, we will use a source condition in the form of a variational inequality, which was introduced by Hofmann et al. [15] for the case of a Hölder-type source condition with index $\nu = 1/2$ and generalized in many recent publications [5, 10, 13, 16]. For their formulation we need the Bregman distance. For a subgradient $u^* \in \partial\mathcal{R}(u^\dagger) \subset \mathcal{X}^*$ (e.g. $u^* = u^\dagger - u_0$ if $\mathcal{R}(u) = \frac{1}{2}\|u - u_0\|_{\mathcal{X}}^2$ with a Hilbert norm $\|\cdot\|_{\mathcal{X}}$) the Bregman distance of $\mathcal{R}$ between $u$ and $u^\dagger$ with respect to $u^*$ is given by

$$D_{\mathcal{R}}^{u^*}(u, u^\dagger) := \mathcal{R}(u) - \mathcal{R}(u^\dagger) - \langle u^*, u - u^\dagger\rangle.$$
In the aforementioned example of $\mathcal{R}(u) = \frac{1}{2}\|u - u_0\|_{\mathcal{X}}^2$ for a Hilbert space norm $\|\cdot\|_{\mathcal{X}}$ we have $D_{\mathcal{R}}^{u^*}(u, u^\dagger) = \frac{1}{2}\|u - u^\dagger\|_{\mathcal{X}}^2$. In this sense, the Bregman distance is a natural generalization of the norm. We will use the Bregman distance also to measure the error of our approximate solutions. Now we are able to formulate our assumption on the smoothness of $u^\dagger$:

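Assumption 2 (variational source condition). $\mathcal{R}: \mathcal{X} \to (-\infty, \infty]$ is a proper convex functional and there exist $u^* \in \partial\mathcal{R}(u^\dagger)$, a parameter $\beta > 0$ and an index function $\varphi$ (i.e. $\varphi$ monotonically increasing, $\varphi(0) = 0$) such that $\varphi^2$ is concave and

$$\beta D_{\mathcal{R}}^{u^*}(u, u^\dagger) \leq \mathcal{R}(u) - \mathcal{R}(u^\dagger) + \varphi\left(\mathcal{T}(g^\dagger; F(u))\right) \quad \text{for all } u \in \mathfrak{B}. \tag{13}$$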
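The Bregman distance in (13) can be computed directly from its definition. This sketch works on $\mathcal{X} = \mathbb{R}^3$ with differentiable penalties, so that $u^* = \nabla\mathcal{R}(u^\dagger)$ is the only subgradient. It reproduces $D_{\mathcal{R}}^{u^*}(u, u^\dagger) = \frac{1}{2}\|u - u^\dagger\|^2$ for $\mathcal{R}(u) = \frac{1}{2}\|u - u_0\|^2$, and for the discrete entropy $\mathcal{R}(u) = \sum_i u_i\ln u_i$ it gives the discrete Kullback-Leibler divergence $\sum_i [u_i\ln(u_i/u_i^\dagger) - u_i + u_i^\dagger]$. The vectors and function names are illustrative.

```python
import math


def bregman(R, grad_R, u, u_dag):
    """D_R^{u*}(u, u_dag) = R(u) - R(u_dag) - <u*, u - u_dag> with u* = grad_R(u_dag)."""
    u_star = grad_R(u_dag)
    return R(u) - R(u_dag) - sum(s * (a - b) for s, a, b in zip(u_star, u, u_dag))


u0 = [0.5, -1.0, 2.0]


def quad(u):
    """R(u) = 1/2 ||u - u0||^2 on R^3."""
    return 0.5 * sum((a - b) ** 2 for a, b in zip(u, u0))


def quad_grad(u):
    return [a - b for a, b in zip(u, u0)]


def entropy(u):
    """R(u) = sum_i u_i ln u_i for positive vectors (discrete maximum entropy penalty)."""
    return sum(a * math.log(a) for a in u)


def entropy_grad(u):
    return [math.log(a) + 1.0 for a in u]


u, u_dag = [1.0, 2.0, 0.5], [0.8, 1.5, 1.0]
print(bregman(quad, quad_grad, u, u_dag))        # equals 1/2 ||u - u_dag||^2
print(bregman(entropy, entropy_grad, u, u_dag))  # equals sum u ln(u/u_dag) - u + u_dag
```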

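Example 3.2. Let $\psi$ be an index function with $\psi^2$ concave, and let $F: \mathfrak{B} \subset \mathcal{X} \to \mathcal{Y}$ be Fréchet differentiable between Hilbert spaces $\mathcal{X}$ and $\mathcal{Y}$ with Fréchet derivative $F'[\cdot]$. Flemming [?] has shown that

$$u^\dagger - u_0 = \psi\left(F'[u^\dagger]^* F'[u^\dagger]\right)\omega \tag{14}$$

together with the tangential cone condition $\|F'[u^\dagger](u - u^\dagger)\|_{\mathcal{Y}} \leq \eta\|F(u) - F(u^\dagger)\|_{\mathcal{Y}}$ implies the variational inequality

$$\beta\|u - u^\dagger\|_{\mathcal{X}}^2 \leq \|u - u_0\|_{\mathcal{X}}^2 - \|u^\dagger - u_0\|_{\mathcal{X}}^2 + \varphi_\psi\left(\|F(u) - g^\dagger\|_{\mathcal{Y}}^2\right) \tag{15}$$

for all $u \in \mathfrak{B}$. Here $\varphi_\psi$ is another index function depending on $\psi$, and for the most important cases of Hölder-type and logarithmic source conditions the implications

$$\psi(\tau) = \tau^\nu \;\Rightarrow\; \varphi_\psi(\tau) = \bar\beta_\nu\tau^{\frac{2\nu}{2\nu+1}}, \tag{16a}$$
$$\psi(\tau) = (-\ln\tau)^{-p} \;\Rightarrow\; \varphi_\psi(\tau) = \bar\beta_p(-\ln\tau)^{-2p} \tag{16b}$$

hold true with some constants $\bar\beta_\nu, \bar\beta_p > 0$, where $p > 0$ and $\nu \in (0, 1/2]$ (see Hofmann & Yamamoto [16, Prop. 6.6] and Flemming [?, Sec. 13.5.2], respectively).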

With the notation of Assumption 1 for the error, we are able to perform a deterministic convergence analysis including an error decomposition. Following Grasmair [13] we use the Fenchel conjugate of $-\varphi$ to bound the approximation error. Recall that the Fenchel conjugate $\phi^*$ of a function $\phi: \mathbb{R} \to (-\infty, \infty]$ is given by

$$\phi^*(s) := \sup_{\tau \in \mathbb{R}}\left(s\tau - \phi(\tau)\right).$$

$\phi^*$ is always convex as a supremum over the affine-linear (and hence convex) functions $s \mapsto s\tau - \phi(\tau)$. Setting $\varphi(\tau) := -\infty$ for $\tau < 0$ we obtain

$$(-\varphi)^*(s) = \sup_{\tau \geq 0}\left(s\tau + \varphi(\tau)\right). \tag{17}$$

This allows us to apply tools from convex analysis: for convex and continuous $\phi$, Young's inequality holds true (see e.g. [9, eq. (4.1) and Prop. 5.1]), which states

$$s\tau \leq \phi(\tau) + \phi^*(s) \quad \text{for all } s, \tau \in \mathbb{R}, \tag{18}$$
$$s\tau = \phi(\tau) + \phi^*(s) \;\Leftrightarrow\; s \in \partial\phi(\tau).$$

Moreover, for convex and continuous $\phi$ we have $\phi^{**} = \phi$ (see e.g. [5, Prop. 4.1]).
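For Hölder-type index functions $\varphi(\tau) = \tau^\kappa$ the conjugate (17) has a closed form: for $s < 0$ the maximiser of $s\tau + \tau^\kappa$ is $\tau^* = (\kappa/|s|)^{1/(1-\kappa)}$, which gives $(-\varphi)^*(s) = (1-\kappa)(\kappa/|s|)^{\kappa/(1-\kappa)}$. The sketch below compares this with a brute-force grid evaluation of the supremum; the grid bounds are assumptions that must bracket $\tau^*$. Assumption 2 additionally requires $\varphi^2$ to be concave, i.e. $\kappa \leq 1/2$, but the formula itself holds for $0 < \kappa < 1$.

```python
import math


def neg_phi_conjugate(phi, s, tau_min=1e-12, tau_max=1e6, n=100000):
    """Approximate (-phi)^*(s) = sup_{tau >= 0} (s*tau + phi(tau)), cf. (17), for s < 0.

    Grid search over tau = 0 and a geometric grid in [tau_min, tau_max]; only
    accurate if the maximiser lies inside that interval.
    """
    best = phi(0.0)
    ratio = tau_max / tau_min
    for k in range(n + 1):
        tau = tau_min * ratio ** (k / n)
        best = max(best, s * tau + phi(tau))
    return best


def neg_power_conjugate(kappa, s):
    """Closed form of (-phi)^*(s) for phi(tau) = tau**kappa, 0 < kappa < 1, s < 0."""
    return (1.0 - kappa) * (kappa / abs(s)) ** (kappa / (1.0 - kappa))


for kappa, s in [(0.5, -0.5), (0.5, -2.0), (0.3, -0.5)]:
    phi = lambda tau, k=kappa: tau ** k
    print(kappa, s, neg_phi_conjugate(phi, s), neg_power_conjugate(kappa, s))
```

For $\kappa = 1/2$ the closed form reduces to $(-\varphi)^*(s) = 1/(4|s|)$.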
Now we are in a position to prove our deterministic convergence rates result: 

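Theorem 3.3. Suppose Assumptions 1 and 2 hold true and the Tikhonov functional $u \mapsto \mathcal{S}(g^{\mathrm{obs}}; F(u)) + \alpha\mathcal{R}(u)$ has a global minimizer $u_\alpha$. Then we have the following assertions:

1. For all $\alpha > 0$ and $\mathbf{err} \geq 0$ we have

$$\beta D_{\mathcal{R}}^{u^*}(u_\alpha, u^\dagger) \leq \frac{\mathbf{err}}{\alpha} + (-\varphi)^*\left(-\frac{1}{C_{\mathrm{err}}\alpha}\right). \tag{19}$$

2. Let $\mathbf{err} > 0$. Then the infimum of the right-hand side of (19) is attained at $\alpha = \bar\alpha$ if and only if $-\frac{1}{C_{\mathrm{err}}\bar\alpha} \in \partial(-\varphi)(C_{\mathrm{err}}\mathbf{err})$, and we have

$$\beta D_{\mathcal{R}}^{u^*}(u_{\bar\alpha}, u^\dagger) \leq \sqrt{C_{\mathrm{err}}}\,\varphi(\mathbf{err}). \tag{20}$$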

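Proof. (1): By the definition of $u_\alpha$ we have

$$\mathcal{S}(g^{\mathrm{obs}}; F(u_\alpha)) + \alpha\mathcal{R}(u_\alpha) \leq \mathcal{S}(g^{\mathrm{obs}}; g^\dagger) + \alpha\mathcal{R}(u^\dagger). \tag{21}$$

It follows that

$$\begin{aligned}
\beta D_{\mathcal{R}}^{u^*}(u_\alpha, u^\dagger) &\overset{\text{Ass. 2}}{\leq} \mathcal{R}(u_\alpha) - \mathcal{R}(u^\dagger) + \varphi\left(\mathcal{T}(g^\dagger; F(u_\alpha))\right)\\
&\overset{(21)}{\leq} \frac{1}{\alpha}\left(\mathcal{S}(g^{\mathrm{obs}}; g^\dagger) - \mathcal{S}(g^{\mathrm{obs}}; F(u_\alpha))\right) + \varphi\left(\mathcal{T}(g^\dagger; F(u_\alpha))\right)\\
&\overset{\text{Ass. 1}}{\leq} \frac{\mathbf{err}}{\alpha} - \frac{1}{C_{\mathrm{err}}\alpha}\mathcal{T}(g^\dagger; F(u_\alpha)) + \varphi\left(\mathcal{T}(g^\dagger; F(u_\alpha))\right)\\
&\leq \frac{\mathbf{err}}{\alpha} + \sup_{\tau \geq 0}\left(-\frac{\tau}{C_{\mathrm{err}}\alpha} + \varphi(\tau)\right) \overset{(17)}{=} \frac{\mathbf{err}}{\alpha} + (-\varphi)^*\left(-\frac{1}{C_{\mathrm{err}}\alpha}\right).
\end{aligned}$$

(2): Using the fact that $(-\varphi)^{**} = -\varphi$ and substituting $s = -1/(C_{\mathrm{err}}\alpha)$ we obtain

$$\inf_{\alpha > 0}\left[\frac{\mathbf{err}}{\alpha} + (-\varphi)^*\left(-\frac{1}{C_{\mathrm{err}}\alpha}\right)\right] = -\sup_{s < 0}\left[sC_{\mathrm{err}}\mathbf{err} - (-\varphi)^*(s)\right] = -(-\varphi)^{**}(C_{\mathrm{err}}\mathbf{err}) = \varphi(C_{\mathrm{err}}\mathbf{err}) \leq \sqrt{C_{\mathrm{err}}}\,\varphi(\mathbf{err}),$$

where we used the concavity of $\varphi^2$. By the conditions for equality in Young's inequality (18) the supremum is attained at $s = -1/(C_{\mathrm{err}}\bar\alpha)$, i.e. the infimum at $\alpha = \bar\alpha$, if and only if $-\frac{1}{C_{\mathrm{err}}\bar\alpha} \in \partial(-\varphi)(C_{\mathrm{err}}\mathbf{err})$. ■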

Remark 3.4. Since $\varphi$ is assumed to be finite, we have $\partial(-\varphi)(s) \neq \emptyset$ for all $s > 0$ (see e.g. [?, Cor. 2.3 and Prop. 5.2]), i.e. the parameter choice (25) is feasible. If $\varphi$ is differentiable, then $\partial(-\varphi)(s) = \{-\varphi'(s)\}$ and (25) is equivalent to $\alpha = 1/(C_{\mathrm{err}}\varphi'(C_{\mathrm{err}}\mathbf{err}))$.
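Theorem 3.3 and Remark 3.4 can be checked for $\varphi(\tau) = \tau^\kappa$. With $x = C_{\mathrm{err}}\mathbf{err}$, the choice $\bar\alpha = 1/(C_{\mathrm{err}}\varphi'(x)) = x^{1-\kappa}/(C_{\mathrm{err}}\kappa)$ makes the two terms of (19) equal to $\kappa x^\kappa$ and $(1-\kappa)x^\kappa$, so their sum is $\varphi(C_{\mathrm{err}}\mathbf{err})$. The sketch below, with illustrative parameter values, evaluates the right-hand side of (19) at $\bar\alpha$ and at other values of $\alpha$, and compares it with the bound $\sqrt{C_{\mathrm{err}}}\,\varphi(\mathbf{err})$ from (20).

```python
def neg_power_conjugate(kappa, s):
    """(-phi)^*(s) for phi(tau) = tau**kappa, 0 < kappa < 1 and s < 0 (closed form)."""
    return (1.0 - kappa) * (kappa / abs(s)) ** (kappa / (1.0 - kappa))


def rhs_19(kappa, err, c_err, alpha):
    """Right-hand side of (19): err/alpha + (-phi)^*(-1/(C_err*alpha))."""
    return err / alpha + neg_power_conjugate(kappa, -1.0 / (c_err * alpha))


def optimal_alpha(kappa, err, c_err):
    """Remark 3.4 for differentiable phi: alpha = 1 / (C_err * phi'(C_err * err))."""
    x = c_err * err
    return 1.0 / (c_err * kappa * x ** (kappa - 1.0))


kappa, err, c_err = 0.5, 1e-3, 2.0
a_bar = optimal_alpha(kappa, err, c_err)
print(rhs_19(kappa, err, c_err, a_bar), (c_err * err) ** kappa)  # both equal phi(C_err*err)
print(rhs_19(kappa, err, c_err, 3.0 * a_bar))                    # larger for another alpha
```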



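Example 3.5 (Classical case). Let $F = T: \mathcal{X} \to \mathcal{Y}$ be a bounded linear operator between Hilbert spaces $\mathcal{X}$ and $\mathcal{Y}$. For $\mathcal{S}(g_2; g_1) = \mathcal{T}(g_2; g_1) = \|g_1 - g_2\|_{\mathcal{Y}}^2$ and $\mathcal{R}(u) = \|u - u_0\|_{\mathcal{X}}^2$ we have $D_{\mathcal{R}}^{u^*}(u, u^\dagger) = \|u - u^\dagger\|_{\mathcal{X}}^2$, $C_{\mathrm{err}} = 2$ and $\mathbf{err} = 2\|g^\dagger - g^{\mathrm{obs}}\|_{\mathcal{Y}}^2$. Moreover, (14) implies (13) with $\varphi = \varphi_\psi$. If $\|g^\dagger - g^{\mathrm{obs}}\|_{\mathcal{Y}} \leq \delta$ as mentioned in the introduction, then we obtain for an appropriate parameter choice $\|u_{\bar\alpha} - u^\dagger\|_{\mathcal{X}} = \mathcal{O}\left(\sqrt{\varphi_\psi(\delta^2)}\right)$. For the special examples of $\psi$ given in (16) we obtain

$$\|u_{\bar\alpha} - u^\dagger\|_{\mathcal{X}} = \mathcal{O}\left(\delta^{\frac{2\nu}{2\nu+1}}\right), \qquad \|u_{\bar\alpha} - u^\dagger\|_{\mathcal{X}} = \mathcal{O}\left((-\ln\delta)^{-p}\right),$$

respectively, and these convergence rates are known to be of optimal order.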
4. Convergence rates for Poisson data with a priori parameter choice rule

In this section we will combine Theorems 2.1 and 3.3 to obtain convergence rates for the method (2) with Poisson data. We need the following properties of the operator $F$:

Assumption 3 (Assumptions on the forward operator). Let $\mathcal{X}$ be a Banach space and $\mathfrak{B} \subset \mathcal{X}$ a bounded, closed and convex subset containing the exact solution $u^\dagger$ to (1). Let $\mathbb{M} \subset \mathbb{R}^d$ be a bounded domain with Lipschitz boundary $\partial\mathbb{M}$. Assume moreover that the operator $F: \mathfrak{B} \to \mathcal{Y} := L^1(\mathbb{M})$ has the following properties:

1. $F(u) \geq 0$ a.e. for all $u \in \mathfrak{B}$.

2. There exists a Sobolev index $s > d/2$ such that $F(\mathfrak{B})$ is a bounded subset of $H^s(\mathbb{M})$.

Property 1 is natural since photon densities (or, more generally, intensities of Poisson processes) have to be non-negative. Property 2 is not restrictive for inverse problems since it corresponds to a smoothing property of $F$, which is usually responsible for the ill-posedness of the underlying problem.

Remark 4.1 (Discussion of Assumption 2). Let Assumption 3 and (14) hold true. Since we have the lower bound

$$\|g - \hat g\|_{L^2(\mathbb{M})}^2 \leq \left(\frac{2}{3}\|g + \sigma\|_{L^\infty(\mathbb{M})} + \frac{4}{3}\|\hat g + \sigma\|_{L^\infty(\mathbb{M})}\right)\mathcal{T}(g; \hat g) \tag{22}$$

with $\mathcal{T}$ as in (9) at hand (see [?]), (15) obviously implies (13) with $\mathcal{T}$ as in (9) and an index function differing from $\varphi_\psi$ only by a multiplicative constant.

Thus Assumption 2 is weaker than a spectral source condition. In particular, if $F(u^\dagger) = 0$ on some parts of $\mathbb{M}$, it may happen that (13) holds true with an index function better than $\varphi_\psi$.
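The lower bound (22) rests on a pointwise inequality between $(u - v)^2$ and the Kullback-Leibler integrand, applied with $u = g + \sigma$ and $v = \hat g + \sigma$ (so that $u - v = g - \hat g$). The following sketch checks that pointwise inequality numerically on a logarithmic grid in $[10^{-3}, 10^2]$; this is a plausibility check under the grid assumption, not a proof.

```python
import math


def kl_integrand(u, v):
    """Pointwise integrand of KL(u; v) = int [v - u - u ln(v/u)] dx, for u, v > 0."""
    return v - u - u * math.log(v / u)


def min_slack_22(values):
    """Smallest value of (2u/3 + 4v/3) * kl_integrand(u, v) - (u - v)^2 over a grid.

    A non-negative result everywhere is the pointwise form of (22); integrating
    and bounding u, v by their sup norms gives (22) itself.
    """
    worst = math.inf
    for u in values:
        for v in values:
            slack = (2.0 * u / 3.0 + 4.0 * v / 3.0) * kl_integrand(u, v) - (u - v) ** 2
            worst = min(worst, slack)
    return worst


grid = [10.0 ** (k / 20.0) for k in range(-60, 41)]  # 1e-3 ... 1e2
print(min_slack_22(grid))
```

Swapping the coefficients $2/3$ and $4/3$ breaks the inequality, e.g. for $u = 1/2$ and $v = 1$, so the order of the arguments of $\mathcal{T}$ in (22) matters.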



Assumption 3 moreover allows us to prove the following corollary, which shows that Theorem 2.1 applies to the integrals in (10):



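Corollary 4.2. Let Assumption 3 hold true, set

$$R := \sup_{u \in \mathfrak{B}}\|F(u)\|_{H^s(\mathbb{M})},$$

and consider $Z$ defined in (10) with $\sigma > 0$. Then there exists $C_{\mathrm{conc}} \geq 1$ depending only on $\mathbb{M}$ and $s$ such that

$$\mathbb{P}\left[\sup_{u \in \mathfrak{B}} Z(F(u)) \leq \frac{\rho}{\sqrt{t}}\right] \geq 1 - \exp\left(-\frac{\rho}{R\max\{\sigma^{-\lfloor s\rfloor - 1}, |\ln(R)|\}\,C_{\mathrm{conc}}}\right) \tag{23}$$

for all $t \geq 1$ and $\rho \geq R\max\{\sigma^{-\lfloor s\rfloor - 1}, |\ln(R)|\}\,C_{\mathrm{conc}}$.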

Proof. Without loss of generality we may assume that $R > 1$. Due to Sobolev's embedding theorem and $s > d/2$ we have $\|F(u)\|_{L^\infty(\mathbb{M})} \leq R\|E_\infty\|$ for all $u \in \mathfrak{B}$, where $E_\infty: H^s(\mathbb{M}) \to L^\infty(\mathbb{M})$ denotes the embedding operator.

By an extension argument it can be seen from [35] that for $\mathbb{M} \subset \mathbb{R}^d$ with Lipschitz boundary, $g \in H^s(\mathbb{M}) \cap L^\infty(\mathbb{M})$ and $\Phi \in C^{\lfloor s\rfloor + 1}(\mathbb{R})$ one has $\Phi \circ g \in H^s(\mathbb{M})$ and

$$\|\Phi \circ g\|_{H^s(\mathbb{M})} \leq C\|\Phi\|_{C^{\lfloor s\rfloor + 1}(\mathbb{R})}\|g\|_{H^s(\mathbb{M})} \tag{24}$$

with $C > 0$ independent of $\Phi$ and $g$. To apply this result, we first extend the function $x \mapsto \ln(x + \sigma)$ from $[0, R\|E_\infty\|]$ (since we have $0 \leq F(u) \leq R\|E_\infty\|$ a.e.) to a function $\Phi$ on the whole real line such that $\Phi \in C^{\lfloor s\rfloor + 1}(\mathbb{R})$. Then for any fixed $u \in \mathfrak{B}$ we obtain $\Phi \circ F(u) \in H^s(\mathbb{M})$, and since $\Phi|_{[0, R\|E_\infty\|]} = \ln(\cdot + \sigma)$ and $0 \leq F(u) \leq R\|E_\infty\|$ a.e., we have $\Phi \circ F(u) = \ln(F(u) + \sigma)$ a.e. Since all derivatives up to order $\lfloor s\rfloor + 1$ of $x \mapsto \ln(x + \sigma)$, and hence of $\Phi$, on $[0, R\|E_\infty\|]$ can be bounded by some constant of order $\max\{\sigma^{-\lfloor s\rfloor - 1}, |\ln(R\|E_\infty\|)|\}$, the extension and composition procedure described above is bounded, i.e. there exists a constant $C > 0$ independent of $u$, $R$ and $\sigma$ such that

$$\|\ln(F(u) + \sigma)\|_{H^s(\mathbb{M})} \leq C\max\{\sigma^{-\lfloor s\rfloor - 1}, |\ln(R)|\}\,R$$

for all $u \in \mathfrak{B}$. Now the assertion follows from Theorem 2.1. ■

Now we are able to present our first main result for Poisson data: 

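Theorem 4.3. Let Assumption 2 with $\mathcal{T}$ defined in (9) and Assumption 3 be satisfied. Moreover, suppose that (2) with $\mathcal{S}$ in (8) has a global minimizer. If we choose the regularization parameter $\alpha = \alpha(t)$ such that

$$-\frac{1}{\alpha} \in \partial(-\varphi)\left(\frac{1}{\sqrt{t}}\right), \tag{25}$$

then we obtain the convergence rate

$$\mathbb{E}\left[D_{\mathcal{R}}^{u^*}(u_\alpha, u^\dagger)\right] = \mathcal{O}\left(\varphi\left(\frac{1}{\sqrt{t}}\right)\right), \quad t \to \infty.$$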



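Proof. First note that Assumption 1 holds true with $C_{\mathrm{err}} = 1$ whenever the bound $\mathbf{err}$ fulfills (12). By Corollary 4.2 the right-hand side of (12) is bounded by $2\rho/\sqrt{t}$ with probability greater than or equal to $1 - \exp(-c\rho)$ for $\rho \geq 1/c$, where

$$c := \left(R\max\{\sigma^{-\lfloor s\rfloor - 1}, |\ln(R)|\}\,C_{\mathrm{conc}}\right)^{-1}$$

and $C_{\mathrm{conc}}$ is as in Corollary 4.2. Now let $\rho_k := k/c$, $k \in \mathbb{N}$, and consider the events

$$E_k := \left\{\sup_{u \in \mathfrak{B}} Z(F(u)) \leq \frac{\rho_k}{\sqrt{t}}\right\}, \quad k \in \mathbb{N},$$

with $Z$ as defined in (10), and set $E_0 := \emptyset$. Corollary 4.2 implies

$$\mathbb{P}[E_k^c] \leq \exp(-k),$$

and on $E_k$ Assumption 1 holds true with $C_{\mathrm{err}} = 1$ and $\mathbf{err} = 2\sup_{u \in \mathfrak{B}} Z(F(u)) \leq 2\rho_k/\sqrt{t}$. Thus Theorem 3.3(1) implies, since $(-\varphi)^* \geq 0$,

$$\beta D_{\mathcal{R}}^{u^*}(u_\alpha, u^\dagger) \leq \frac{2\rho_k}{\alpha\sqrt{t}} + (-\varphi)^*\left(-\frac{1}{\alpha}\right) \leq \max\{2\rho_k, 1\}\left[\frac{1}{\alpha\sqrt{t}} + (-\varphi)^*\left(-\frac{1}{\alpha}\right)\right]$$

on $E_k$ for all $k \in \mathbb{N}$ and $\alpha > 0$. According to Theorem 3.3(2) the infimum of the bracket on the right-hand side is attained at $\alpha$ defined in (25), and

$$\sup_{E_k} D_{\mathcal{R}}^{u^*}(u_\alpha, u^\dagger) \leq \frac{\max\{2\rho_k, 1\}}{\beta}\,\varphi\left(\frac{1}{\sqrt{t}}\right) = C(k)\,\varphi\left(\frac{1}{\sqrt{t}}\right)$$

for all $k \in \mathbb{N}$ with $C(k) := \max\{2\rho_k, 1\}/\beta$, which grows linearly in $k$. Since the events $E_k$ increase with $k$ and $\mathbb{P}[E_k \setminus E_{k-1}] \leq \mathbb{P}[E_{k-1}^c] \leq \exp(-(k-1))$, we obtain

$$\mathbb{E}\left[D_{\mathcal{R}}^{u^*}(u_\alpha, u^\dagger)\right] \leq \sum_{k=1}^\infty \mathbb{P}[E_k \setminus E_{k-1}]\sup_{E_k} D_{\mathcal{R}}^{u^*}(u_\alpha, u^\dagger) \leq \varphi\left(\frac{1}{\sqrt{t}}\right)\sum_{k=1}^\infty \exp(-(k-1))\,C(k).$$

The sum converges and the proof is complete. ■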
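The last step of the proof only needs the convergence of $\sum_{k \geq 1} k\,e^{-(k-1)}$, since $C(k)$ grows linearly in $k$. By $\sum_{k \geq 1} k x^{k-1} = (1-x)^{-2}$ with $x = e^{-1}$ its value is $e^2/(e-1)^2$; a short check:

```python
import math


def series_factor(n_terms=60):
    """Partial sum of sum_{k >= 1} k * exp(-(k - 1)).

    In the proof, P[E_k minus E_{k-1}] <= exp(-(k-1)) and C(k) grows linearly in k,
    so the expectation bound is phi(1/sqrt(t)) times a constant multiple of this series.
    """
    return sum(k * math.exp(-(k - 1)) for k in range(1, n_terms + 1))


closed_form = 1.0 / (1.0 - math.exp(-1.0)) ** 2  # sum_k k x^(k-1) = 1/(1-x)^2 at x = 1/e
print(series_factor(), closed_form)               # both about 2.5027
```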



5. A Lepskii-type parameter choice rule

Usually the parameter choice rule (25) is not implementable since it requires a priori knowledge of the function $\varphi$ characterizing the smoothness of the unknown solution $u^\dagger$. To adapt to unknown smoothness of the solution, a posteriori parameter choice rules have to be used. In a deterministic context the most widely used such rule is the discrepancy principle. However, in our context it is not applicable in an obvious way, since $\mathcal{S}$ approximates $\mathcal{T}$ only up to the unknown constant $\mathbb{E}[\mathcal{S}(G_t; g^\dagger)]$.

In the following we describe the Lepskii principle as analyzed in the context of inverse problems by Mathé and Pereverzev [?]. Lepskii's balancing principle requires a metric on $\mathcal{X}$, and hence we assume in the following that there exist a constant $C_{\mathrm{bd}} > 0$ and a number $q \geq 1$ such that

$$\|u - u^\dagger\|_{\mathcal{X}}^q \leq C_{\mathrm{bd}}D_{\mathcal{R}}^{u^*}(u, u^\dagger) \quad \text{for all } u \in \mathfrak{B}. \tag{26}$$

This is fulfilled trivially with $q = 2$ and $C_{\mathrm{bd}} = 1$ if $\mathcal{X}$ is a Hilbert space and $\mathcal{R}(u) = \|u - u_0\|_{\mathcal{X}}^2$ (then we have equality in (26)). Moreover, for a $q$-convex Banach space $\mathcal{X}$ and $\mathcal{R}(u) = \|u\|_{\mathcal{X}}^q$ the estimate (26) is valid (see [31]). Besides these special cases of norm powers, (26) can be fulfilled for other choices of $\mathcal{R}$. E.g. for maximum entropy regularization, i.e. $\mathcal{R}(u) = \int u\ln(u)\, dx$, the Bregman distance coincides with the Kullback-Leibler divergence, and we have seen in Remark 4.1 that (26) holds true in this situation.

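The deterministic convergence analysis from Section 3 already provides an error decomposition. Assuming $\beta \geq 1/2$ and $C_{\mathrm{err}} = 1$ (as for Poisson data), Theorem 3.3(1) together with (26) and the elementary inequality $(a + b)^{1/q} \leq a^{1/q} + b^{1/q}$ states that

$$\|u_\alpha - u^\dagger\|_{\mathcal{X}} \leq \frac{1}{2}\left(f_{\mathrm{app}}(\alpha) + f_{\mathrm{noi}}(\alpha)\right) \quad \text{for all } \alpha > 0 \tag{27}$$

with the approximation error $f_{\mathrm{app}}(\alpha)$ and the propagated data noise error $f_{\mathrm{noi}}(\alpha)$ defined by

$$f_{\mathrm{app}}(\alpha) := 2\left(2C_{\mathrm{bd}}(-\varphi)^*\left(-\frac{1}{\alpha}\right)\right)^{1/q} \quad \text{and} \quad f_{\mathrm{noi}}(\alpha) := 2\left(2C_{\mathrm{bd}}\frac{\mathbf{err}}{\alpha}\right)^{1/q}. \tag{28}$$

Here the constant 2 in front of $C_{\mathrm{bd}}$ is an estimate of $1/\beta$. For the error decomposition (27) it is important to note that $f_{\mathrm{app}}$ is typically unknown, whereas $f_{\mathrm{noi}}$ is known if the upper bound $\mathbf{err}$ is available. By Corollary 4.2, the error is bounded by $\rho/\sqrt{t}$ with probability $1 - \exp(-c\rho)$. This observation is fundamental in the proof of the following theorem: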

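Theorem 5.1. Let Assumptions 2 and 3 with $\beta \in [1/2, \infty)$ and $\mathcal{S}$ and $\mathcal{T}$ as in (8) and (9) be fulfilled, and suppose (26) holds true. Suppose that (2) has a global minimizer and let $\sigma > 0$, $r > 1$, $R := \sup_{u \in \mathfrak{B}}\|F(u)\|_{H^s(\mathbb{M})} < \infty$ and $\tau \geq R\max\{\sigma^{-\lfloor s\rfloor - 1}, |\ln(R)|\}\,C_{\mathrm{conc}}$. Define the sequence

$$\alpha_j := \frac{\tau\ln(t)}{\sqrt{t}}\,r^{j-1}, \quad j \in \mathbb{N}.$$

Then with $m := \min\{j \in \mathbb{N} \mid \alpha_j \geq \ldots\}$ the choice

$$j_{\mathrm{bal}} := \max\left\{j \in \{1, \dots, m\} \;\middle|\; \|u_{\alpha_i} - u_{\alpha_j}\|_{\mathcal{X}} \leq 4(4C_{\mathrm{bd}})^{1/q}\left(\frac{\tau\ln(t)}{\sqrt{t}\,\alpha_i}\right)^{1/q} \text{ for all } i < j\right\} \tag{29}$$

yields

$$\mathbb{E}\left[\|u_{\alpha_{j_{\mathrm{bal}}}} - u^\dagger\|_{\mathcal{X}}^q\right] = \mathcal{O}\left(\varphi\left(\frac{\ln(t)}{\sqrt{t}}\right)\right) \quad \text{as } t \to \infty.$$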
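Proof. If $t \geq \exp(4)$, the assumptions of Corollary 4.2 are fulfilled with $\rho(t) := \tau\ln(t)$. Then, with $Z$ as in (10), the event

$$A_\rho := \left\{\sup_{u \in \mathfrak{B}} Z(F(u)) \leq \frac{\rho(t)}{\sqrt{t}}\right\}$$

has probability $\mathbb{P}[A_\rho^c] \leq \exp(-c\rho(t))$ with $c = \left(R\max\{\sigma^{-\lfloor s\rfloor - 1}, |\ln(R)|\}\,C_{\mathrm{conc}}\right)^{-1}$ by Corollary 4.2. Moreover, as we have seen in (27), on $A_\rho$ the error decomposition

$$\|u_{\alpha_j} - u^\dagger\|_{\mathcal{X}} \leq \frac{1}{2}\left(\Phi(j) + \Psi(j)\right), \quad j = 1, \dots, m,$$

holds true with

$$\Psi(j) := 2(4C_{\mathrm{bd}})^{1/q}\left(\frac{\rho(t)}{\sqrt{t}\,\alpha_j}\right)^{1/q} \quad \text{and} \quad \Phi(j) := f_{\mathrm{app}}(\alpha_j),$$

where $f_{\mathrm{app}}$ is as in (28). Note that $2\Psi(i)$ corresponds to the required bound for $\|u_{\alpha_i} - u_{\alpha_j}\|_{\mathcal{X}}$ in (29). The function $\Psi$ is obviously non-increasing and fulfills $\Psi(j) \leq r^{1/q}\Psi(j + 1)$, and it can be seen by elementary computations that $\Phi$ is monotonically increasing. Now [20, Cor. 1] implies the so-called oracle inequality

$$\sup_{A_\rho}\|u_{\alpha_{j_{\mathrm{bal}}}} - u^\dagger\|_{\mathcal{X}} \leq C_r\min\left\{\Phi(j) + \Psi(j) \;\middle|\; j \in \{1, \dots, m\}\right\}$$

with a constant $C_r$ depending only on $r$ and $q$. By inserting the definitions of $\Phi$ and $\Psi$ we find

$$\sup_{A_\rho}\|u_{\alpha_{j_{\mathrm{bal}}}} - u^\dagger\|_{\mathcal{X}}^q \leq C\min_{j \in \{1, \dots, m\}}\left[(-\varphi)^*\left(-\frac{1}{\alpha_j}\right) + \frac{\rho(t)}{\sqrt{t}\,\alpha_j}\right] \tag{30}$$

with a constant $C > 0$ depending only on $r$, $q$ and $C_{\mathrm{bd}}$, and obviously the minimum over $\alpha_1, \dots, \alpha_m$ can be replaced, up to some constant depending only on $r$, by the infimum over $\alpha \geq \alpha_1$ if $t$ is sufficiently large. By Theorem 3.3(2) the sum $(-\varphi)^*(-1/\alpha) + \rho(t)/(\sqrt{t}\alpha)$ attains its minimum over $\alpha \in (0, \infty)$ at $\alpha = \alpha_{\mathrm{opt}}$ if and only if $-1/\alpha_{\mathrm{opt}} \in \partial(-\varphi)(\rho(t)/\sqrt{t})$. Note that $\rho(t)/\sqrt{t} = \alpha_1$. By elementary arguments from convex analysis we find, using the concavity of $\varphi$, that

$$\frac{1}{\alpha_{\mathrm{opt}}} \leq -\inf\partial(-\varphi)(\alpha_1) = \lim_{h \searrow 0}\frac{\varphi(\alpha_1) - \varphi(\alpha_1 - h)}{h} \leq \frac{\varphi(\alpha_1) - \varphi(s)}{\alpha_1 - s}$$

for all $s \in (0, \alpha_1)$. Thus choosing $s = \alpha_1/2$ shows that $\alpha_1/\alpha_{\mathrm{opt}} \leq 2\left(\varphi(\rho(t)/\sqrt{t}) - \varphi(\rho(t)/(2\sqrt{t}))\right)$ for all $t > 0$. As the right-hand side decays to 0 as $t \to \infty$, we have $\alpha_1 \leq \alpha_{\mathrm{opt}}$ for $t$ sufficiently large. Therefore, the minimum in (30) can indeed be replaced (up to some constant) by the infimum over all $\alpha > 0$ (see [?, Lem. 3.42] for details). Defining $\operatorname{diam}(\mathfrak{B}) := \sup_{u, v \in \mathfrak{B}}\|u - v\|_{\mathcal{X}}$, which is finite by Assumption 3, we find from (30) and Theorem 3.3 that

$$\mathbb{E}\left[\|u_{\alpha_{j_{\mathrm{bal}}}} - u^\dagger\|_{\mathcal{X}}^q\right] \leq \mathbb{P}[A_\rho]\sup_{A_\rho}\|u_{\alpha_{j_{\mathrm{bal}}}} - u^\dagger\|_{\mathcal{X}}^q + \mathbb{P}[A_\rho^c]\operatorname{diam}(\mathfrak{B})^q \leq C\varphi\left(\frac{\rho(t)}{\sqrt{t}}\right) + \exp(-c\rho(t))\operatorname{diam}(\mathfrak{B})^q$$

with some constant $C > 0$. Due to the definition of $\rho$, $2\tau c \geq 1$, $\ln(t) \geq 1$ and $\ln(t)/\sqrt{t} \leq 1$ we obtain

$$\exp(-c\rho(t)) = t^{-c\tau} \leq \frac{1}{\sqrt{t}} \leq \frac{\ln(t)}{\sqrt{t}} \leq \frac{1}{\varphi(1)}\varphi\left(\frac{\ln(t)}{\sqrt{t}}\right) \quad \text{and} \quad \varphi\left(\frac{\tau\ln(t)}{\sqrt{t}}\right) \leq \sqrt{\max\{\tau, 1\}}\,\varphi\left(\frac{\ln(t)}{\sqrt{t}}\right),$$

using the concavity of $\varphi^2$. This proves the assertion. ■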
Note that the constants $R$ and $C_{\mathrm{conc}}$, which are necessary to ensure a proper choice of the sequence $\alpha_j$ and hence for the implementation of this Lepskii-type balancing principle, can be calculated in principle (assuming e.g. the scaling condition $\|g^\dagger\|_{L^1(\mathbb{M})} = 1$). Thus Theorem 5.1 yields convergence rates in expectation for a completely adaptive algorithm.

Comparing the rates in Theorems 4.3 and 5.1, note that we have to pay a logarithmic factor for adaptation to unknown smoothness by the Lepskii principle. It is known (see [28]) that in some cases the loss of such a logarithmic factor is inevitable.
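The balancing rule (29) only uses computable quantities: the reconstructions $u_{\alpha_j}$, the grid $\alpha_j$ and the known noise bound. The following sketch implements it for a user-supplied metric. The reconstructions in the usage example are made-up scalars, the grid length $m$ is passed in directly, and the rule is read as selecting the largest admissible index, which is one common form of Lepskii's principle. It is an illustration under these assumptions, not the authors' implementation.

```python
import math


def alpha_grid(tau, t, r, m):
    """alpha_j = tau * ln(t) / sqrt(t) * r**(j - 1) for j = 1, ..., m."""
    alpha_1 = tau * math.log(t) / math.sqrt(t)
    return [alpha_1 * r ** (j - 1) for j in range(1, m + 1)]


def lepskii_index(us, alphas, tau, t, c_bd, q, dist):
    """Balancing index j_bal from (29), returned 1-based.

    us[j-1] is the reconstruction for alphas[j-1]; dist is the metric of X.
    j_bal is the largest j with
        dist(u_i, u_j) <= 4 (4 C_bd)^(1/q) (tau ln t / (sqrt(t) alpha_i))^(1/q)
    for all i < j (j = 1 always qualifies).
    """
    rho_over_sqrt_t = tau * math.log(t) / math.sqrt(t)

    def bound(i):
        return 4.0 * (4.0 * c_bd) ** (1.0 / q) * (rho_over_sqrt_t / alphas[i - 1]) ** (1.0 / q)

    j_bal = 1
    for j in range(2, len(us) + 1):
        if all(dist(us[i - 1], us[j - 1]) <= bound(i) for i in range(1, j)):
            j_bal = j
    return j_bal


# Hypothetical scalar "reconstructions": stable for small alpha, then a large bias jump.
alphas = alpha_grid(tau=1.0, t=100.0, r=4.0, m=5)
us = [0.0, 0.5, 1.0, 2.9, 20.0]
print(lepskii_index(us, alphas, 1.0, 100.0, c_bd=1.0, q=2.0, dist=lambda a, b: abs(a - b)))
```

Because $\alpha_1 = \tau\ln(t)/\sqrt{t}$, the threshold for index $i$ reduces to $4(4C_{\mathrm{bd}})^{1/q}r^{-(i-1)/q}$, so neither $t$ nor $\tau$ affects the decision once the grid is fixed.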



A. Proof of Theorem 2.1



In this section we will prove the uniform concentration inequality stated in Theorem 2.1. Our result is based on the work of Reynaud-Bouret, who proved the following concentration inequality:

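Lemma A.1 ([25, Corollary 2]). Let $N$ be a Poisson process with finite mean measure $\nu$. Let $\{f_a\}_{a \in \mathcal{A}}$ be a countable family of functions with values in $[-b, b]$ and define

$$Z := \sup_{a \in \mathcal{A}}\left|\int_{\mathbb{M}} f_a(x)\,(dN - d\nu)\right| \quad \text{and} \quad v_0 := \sup_{a \in \mathcal{A}}\int_{\mathbb{M}} f_a^2(x)\, d\nu.$$

Then for all positive numbers $\rho$ and $\varepsilon$ it holds

$$\mathbb{P}\left[Z \geq (1 + \varepsilon)\mathbb{E}[Z] + \sqrt{12v_0\rho} + \kappa(\varepsilon)b\rho\right] \leq \exp(-\rho),$$

where $\kappa(\varepsilon) = 5/4 + 32/\varepsilon$.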

We will use a denseness argument to apply Lemma A.1 to $tZ$ with

$$Z := \sup_{g \in B_s(R)}\left|\int_{\mathbb{M}} g(x)\left(dG_t - g^\dagger\, dx\right)\right|.$$

The properties derived in the following lemma will be sufficient to bound $\mathbb{E}[Z]$.

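Lemma A.2. Let $\mathbb{M} \subset \mathbb{R}^d$ be a bounded domain with Lipschitz boundary, $R \geq 1$ and suppose $s > d/2$. Then there exist a countable family of real-valued functions $\{\phi_j : j \in \mathcal{J}\}$, numbers $\gamma_j > 0$, $j \in \mathcal{J}$, and constants $c_1, c_2 > 0$ depending only on $s$ and $\mathbb{M}$ such that

$$\sum_{j \in \mathcal{J}}\gamma_j\int_{\mathbb{M}}\phi_j^2 g^\dagger\, dx \leq c_1\|g^\dagger\|_{L^1(\mathbb{M})}, \tag{31}$$

and for all $g \in B_s(R)$ there exist real numbers $\beta_j$, $j \in \mathcal{J}$, such that

$$g = \sum_{j \in \mathcal{J}}\beta_j\phi_j \quad \text{and} \quad \sum_{j \in \mathcal{J}}\frac{\beta_j^2}{\gamma_j} \leq c_2^2R^2. \tag{32}$$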



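Proof. Choose some $\kappa > 0$ such that $\mathbb{M} \subset (-\kappa, \kappa)^d$. Then there exists a continuous extension operator $E_{\mathrm{ext}}: H^s(\mathbb{M}) \to H_0^s((-\kappa, \kappa)^d)$ (see e.g. [30, Cor. 5.1]). Consider the following orthonormal bases $\{\psi_j : j \in \mathbb{Z}\}$ of $L^2([-\kappa, \kappa])$ and $\{\phi_j : j \in \mathbb{Z}^d\}$ of $L^2([-\kappa, \kappa]^d)$:

$$\psi_j(x) := \frac{1}{\sqrt{\kappa}}\begin{cases}\sin(\pi jx/\kappa), & j > 0,\\ 1/\sqrt{2}, & j = 0,\\ \cos(\pi jx/\kappa), & j < 0,\end{cases} \qquad \phi_j(x) := \prod_{l=1}^d\psi_{j_l}(x_l).$$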

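We introduce the norm $\|g\|_{H^s_{\mathrm{per}}} := \left(\sum_{j \in \mathbb{Z}^d}(1 + |j|^2)^s\langle g, \phi_j\rangle^2\right)^{1/2}$ and the periodic Sobolev space $H^s_{\mathrm{per}}([-\kappa, \kappa]^d) := \{g \in L^2([-\kappa, \kappa]^d) \mid \|g\|_{H^s_{\mathrm{per}}} < \infty\}$. The embedding $J: H_0^s((-\kappa, \kappa)^d) \hookrightarrow H^s_{\mathrm{per}}([-\kappa, \kappa]^d)$ is well defined and continuous as the norms of both spaces are equivalent (see e.g. [20, Exercise 1.13]), so the extension operator $E := J \circ E_{\mathrm{ext}}: H^s(\mathbb{M}) \to H^s_{\mathrm{per}}([-\kappa, \kappa]^d)$ is continuous. In particular,

$$E(B_s(R)) \subset \left\{g \in H^s_{\mathrm{per}}([-\kappa, \kappa]^d) \;\middle|\; \|g\|_{H^s_{\mathrm{per}}} \leq c_2R\right\} \quad \text{with } c_2 := \|E\|,$$

and (32) holds true with $\beta_j := \langle Eg, \phi_j\rangle$ and $\gamma_j := (1 + |j|^2)^{-s}$. Moreover, as $\|\phi_j^2\|_\infty \leq \kappa^{-d}$ for all $j \in \mathbb{Z}^d$, we obtain

$$\sum_{j \in \mathbb{Z}^d}\gamma_j\int_{\mathbb{M}}\phi_j^2g^\dagger\, dx \leq c_1\int_{\mathbb{M}} g^\dagger\, dx \quad \text{with } c_1 := \kappa^{-d}\sum_{j \in \mathbb{Z}^d}(1 + |j|^2)^{-s},$$

and majorization of the sum by an integral shows that $c_1 < \infty$ as $s > d/2$. Therefore, (31) holds true, and the proof is complete. ■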
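Two ingredients of the proof of Lemma A.2 can be checked numerically in dimension $d = 1$: orthonormality of the functions $\psi_j$ on $[-\kappa, \kappa]$ (here with the hypothetical value $\kappa = 1.5$ and a midpoint rule), and finiteness of $c_1$ for $s > d/2$. For $d = 1$, $s = 1$ and $\kappa = 1$ the series has the known closed form $\sum_{j \in \mathbb{Z}}(1 + j^2)^{-1} = \pi\coth\pi$, which the truncated sum should approach. Function names and truncation levels are illustrative.

```python
import itertools
import math


def psi(j, x, kappa):
    """One-dimensional basis function psi_j on [-kappa, kappa] from the proof of Lemma A.2."""
    if j > 0:
        return math.sin(math.pi * j * x / kappa) / math.sqrt(kappa)
    if j == 0:
        return 1.0 / math.sqrt(2.0 * kappa)
    return math.cos(math.pi * j * x / kappa) / math.sqrt(kappa)


def gram(js, kappa, n=2000):
    """Midpoint-rule Gram matrix <psi_j, psi_k> in L^2([-kappa, kappa])."""
    h = 2.0 * kappa / n
    xs = [-kappa + (i + 0.5) * h for i in range(n)]
    return [[h * sum(psi(j, x, kappa) * psi(k, x, kappa) for x in xs) for k in js] for j in js]


def c1_truncated(s, d, kappa, n_max):
    """kappa^{-d} * sum of (1 + |j|^2)^{-s} over j in Z^d with max_l |j_l| <= n_max."""
    total = 0.0
    for j in itertools.product(range(-n_max, n_max + 1), repeat=d):
        total += (1.0 + sum(x * x for x in j)) ** (-s)
    return total / kappa ** d


G = gram(range(-3, 4), kappa=1.5)
print(max(abs(G[a][b] - (1.0 if a == b else 0.0)) for a in range(7) for b in range(7)))
print(c1_truncated(1.0, 1, 1.0, 100000), math.pi / math.tanh(math.pi))
```

The midpoint rule is essentially exact here because the integrands are trigonometric polynomials of low degree over a full period.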

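Lemma A.3. Under the assumptions of Lemma A.2 we have

$$\mathbb{E}[Z] \leq \frac{c_2R}{\sqrt{t}}\sqrt{c_1\|g^\dagger\|_{L^1(\mathbb{M})}}.$$

Proof. With the help of Lemma A.2 we can now insert (32) and apply Hölder's inequality for sums to find

$$Z \leq \sup_{g \in B_s(R)}\left|\sum_{j \in \mathcal{J}}\frac{\beta_j}{\sqrt{\gamma_j}}\sqrt{\gamma_j}\int_{\mathbb{M}}\phi_j\left(dG_t - g^\dagger\, dx\right)\right| \leq c_2R\left(\sum_{j \in \mathcal{J}}\gamma_j\left|\int_{\mathbb{M}}\phi_j\left(dG_t - g^\dagger\, dx\right)\right|^2\right)^{1/2},$$

where we used that the functions $\phi_j$ are real-valued. Hence by Jensen's inequality

$$\mathbb{E}[Z] \leq \sqrt{\mathbb{E}[Z^2]} \leq c_2R\left(\sum_{j \in \mathcal{J}}\gamma_j\,\mathbb{E}\left[\left|\int_{\mathbb{M}}\phi_j\left(dG_t - g^\dagger\, dx\right)\right|^2\right]\right)^{1/2}. \tag{33}$$

Using (3) for the process $\tilde G_t = tG_t$ with intensity $tg^\dagger$ we obtain

$$\mathbb{E}\left[\left|\int_{\mathbb{M}}\phi_j\left(dG_t - g^\dagger\, dx\right)\right|^2\right] = \frac{1}{t^2}\operatorname{Var}\left[\int_{\mathbb{M}}\phi_j\, d\tilde G_t\right] = \frac{1}{t}\int_{\mathbb{M}}\phi_j^2g^\dagger\, dx,$$

and plugging this into (33) and using (31) we obtain

$$\mathbb{E}[Z] \leq \frac{c_2R}{\sqrt{t}}\sqrt{c_1\|g^\dagger\|_{L^1(\mathbb{M})}}. \qquad ■$$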
L'(R 
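Proof of Theorem 2.1. By Sobolev's embedding theorem the embedding operator $E_\infty: H^s(\mathbb{M}) \to L^\infty(\mathbb{M})$ is well defined and continuous, so

$$\|g\|_{L^\infty(\mathbb{M})} \leq R\|E_\infty\| \quad \text{for all } g \in B_s(R). \tag{34}$$

Now we choose a countable subset $\{g_a\}_{a \in \mathcal{A}} \subset B_s(R)$ which is dense in $B_s(R)$ with respect to the $H^s$-norm, and hence also with respect to the $L^\infty$-norm, and set $N = tG_t$ and $\nu = tg^\dagger\, dx$ in Lemma A.1 to obtain

$$\mathbb{P}\left[tZ \geq (1 + \varepsilon)\mathbb{E}[tZ] + \sqrt{12v_0\rho} + \kappa(\varepsilon)\|E_\infty\|R\rho\right] \leq \exp(-\rho) \tag{35}$$

for all $\rho > 0$. Choosing $\varepsilon = 1$ and using Lemma A.3 and the simple estimate

$$v_0 \leq tR^2\|E_\infty\|^2\|g^\dagger\|_{L^1(\mathbb{M})}$$

yields

$$\mathbb{P}\left[Z \leq C_1\frac{R}{\sqrt{t}} + C_2\frac{R\sqrt{\rho}}{\sqrt{t}} + C_3\frac{R\rho}{t}\right] \geq 1 - \exp(-\rho) \tag{36}$$

for all $\rho, t > 0$ with $C_1 := 2c_2\sqrt{c_1\|g^\dagger\|_{L^1(\mathbb{M})}}$, $C_2 := \sqrt{12}\|E_\infty\|\|g^\dagger\|_{L^1(\mathbb{M})}^{1/2}$ and $C_3 := (32 + 5/4)\|E_\infty\|$. If $t, \rho \geq 1$, we have $1/t \leq 1/\sqrt{t}$ and $1 \leq \sqrt{\rho} \leq \rho$, and hence

$$\mathbb{P}\left[Z \leq (C_1 + C_2 + C_3)\frac{R\rho}{\sqrt{t}}\right] \geq 1 - \exp(-\rho) \quad \text{for } \rho, t \geq 1.$$

Setting $C_{\mathrm{conc}} := \max\{C_1 + C_2 + C_3, 1\}$ and replacing $\rho$ by $\rho/(RC_{\mathrm{conc}})$ this shows the assertion. ■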